{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "### License\n",
    "Copyright 2021 Google LLC\n",
    "\n",
    "Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at\n",
    "\n",
    "  http://www.apache.org/licenses/LICENSE-2.0\n",
    "\n",
    "Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "# Analysis Result\n",
    "\n",
    "### Get the tables with highest discrepancy on write vs read frequency throughout the data warehouse\n",
    "This will list down tables with the highest discrepancy on write vs read frequency."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "source": [
    "import src.pipeline_analysis as pipeline_analysis\n",
    "import ipywidgets as widgets\n",
    "from IPython.display import display\n",
    "import pandas as pd\n",
    "\n",
    "limited_imbalance_tables = []\n",
    "def get_limited_imbalance_tables_df(limit):\n",
    "    global limited_imbalance_tables\n",
    "    limited_imbalance_tables_df = pipeline_analysis.get_tables_read_write_frequency_df(limit)\n",
    "    limited_imbalance_tables = limited_imbalance_tables_df['Table'].tolist()\n",
    "    return limited_imbalance_tables_df\n",
    "\n",
    "limited_imbalance_tables_df = get_limited_imbalance_tables_df(5)\n",
    "display(limited_imbalance_tables_df)"
   ],
   "outputs": [
    {
     "output_type": "display_data",
     "data": {
      "text/plain": [
       "                                               Table  Read Frequency  \\\n",
       "0              data-analytics-pocs.atos.temp_updates               0   \n",
       "1        data-analytics-pocs.atos.max_timestamp_temp               0   \n",
       "2           data-analytics-pocs.atos.identity_4m_new               0   \n",
       "3  data-analytics-pocs._c4108c8641cb849fa23d12f19...               0   \n",
       "4      data-analytics-pocs.public.bigquery_audit_log               0   \n",
       "\n",
       "   Write Frequency  \n",
       "0            12613  \n",
       "1             8131  \n",
       "2             7950  \n",
       "3             2057  \n",
       "4             1698  "
      ],
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Table</th>\n",
       "      <th>Read Frequency</th>\n",
       "      <th>Write Frequency</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>data-analytics-pocs.atos.temp_updates</td>\n",
       "      <td>0</td>\n",
       "      <td>12613</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>data-analytics-pocs.atos.max_timestamp_temp</td>\n",
       "      <td>0</td>\n",
       "      <td>8131</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>data-analytics-pocs.atos.identity_4m_new</td>\n",
       "      <td>0</td>\n",
       "      <td>7950</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>data-analytics-pocs._c4108c8641cb849fa23d12f19...</td>\n",
       "      <td>0</td>\n",
       "      <td>2057</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>data-analytics-pocs.public.bigquery_audit_log</td>\n",
       "      <td>0</td>\n",
       "      <td>1698</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ]
     },
     "metadata": {}
    }
   ],
   "metadata": {
    "scrolled": true
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "#### Get the pipeline graph data of the table\n",
    "This will generate a pipeline graph file, in HTML format, under `pipeline_graph` directory. It may take sometime for this to run and generate."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "source": [
    "def visualise_table_pipelines(table):\n",
    "    pipeline_analysis.display_pipelines_of_table(table)\n",
    "\n",
    "visualise_table_pipelines(\"data-analytics-pocs.public.bigquery_audit_log\")"
   ],
   "outputs": [],
   "metadata": {
    "scrolled": true
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "#### Display the pipeline graph of the table\n",
    "Display the pipeline graph of the table. The thickness of the edges indicates the frequency compared to the rest of the edges in the current graph.\n",
    "\n",
    "The result will be something like this, when you run the notebook, you will be able to click on the different nodes of the graph, each representing different tbales that are part of the pipeline of this table of interest. When you click on a node, it will display more information for this table.\n",
    "\n",
    "![](assets/pipeline-example.gif)"
   ],
   "metadata": {}
  }
 ],
 "metadata": {
  "environment": {
   "name": "common-cpu.m76",
   "type": "gcloud",
   "uri": "gcr.io/deeplearning-platform-release/base-cpu:m76"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}